Search CORE

379 research outputs found

RepFlow: Minimizing Flow Completion Times with Replicated Flows in Data Centers

Author: Li Baochun
Xu Hong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/12/2013
Field of study

Short TCP flows that are critical for many interactive applications in data centers are plagued by large flows and head-of-line blocking in switches. Hash-based load balancing schemes such as ECMP aggravate the matter and result in long-tailed flow completion times (FCT). Previous work on reducing FCT usually requires custom switch hardware and/or protocol changes. We propose RepFlow, a simple yet practically effective approach that replicates each short flow to reduce the completion times, without any change to switches or host kernels. With ECMP the original and replicated flows traverse distinct paths with different congestion levels, thereby reducing the probability of having long queueing delay. We develop a simple analytical model to demonstrate the potential improvement of RepFlow. Extensive NS-3 simulations and Mininet implementation show that RepFlow provides 50%--70% speedup in both mean and 99-th percentile FCT for all loads, and offers near-optimal FCT when used with DCTCP.Comment: To appear in IEEE INFOCOM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Dominant Resource Fairness in Cloud Computing Systems with Heterogeneous Servers

Author: Li Baochun
Liang Ben
Wang Wei
Publication venue
Publication date: 31/07/2013
Field of study

We study the multi-resource allocation problem in cloud computing systems where the resource pool is constructed from a large number of heterogeneous servers, representing different points in the configuration space of resources such as processing, memory, and storage. We design a multi-resource allocation mechanism, called DRFH, that generalizes the notion of Dominant Resource Fairness (DRF) from a single server to multiple heterogeneous servers. DRFH provides a number of highly desirable properties. With DRFH, no user prefers the allocation of another user; no one can improve its allocation without decreasing that of the others; and more importantly, no user has an incentive to lie about its resource demand. As a direct application, we design a simple heuristic that implements DRFH in real-world systems. Large-scale simulations driven by Google cluster traces show that DRFH significantly outperforms the traditional slot-based scheduler, leading to much higher resource utilization with substantially shorter job completion times

arXiv.org e-Print Archive

CiteSeerX

Crossref

rStream: Resilient and Optimal Peer-to-Peer Streaming with Rateless Codes

Author: Baochun Li
Chuan Wu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Designing Two-Dimensional Spectrum Auctions for Mobile Secondary Users

Author: Baochun Li
Yuefei Zhu
Zongpeng Li
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Nuclei: GPU-Accelerated Many-Core Network Coding

Author: Baochun Li
Hassan Shojania
Xin Wang
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

Abstract—While it is a well known result that network coding achieves optimal flow rates in multicast sessions, its potential for practical use has remained to be a question, due to its high computational complexity. Our previous work has attempted to design a hardware-accelerated and multi-threaded implementation of network coding to fully utilize multi-core CPUs, as well as SSE2 and AltiVec SIMD vector instructions on x86 and PowerPC processors. This paper represents another step forward, and presents the first attempt in the literature to maximize the performance of network coding by taking advantage of not only multi-core CPUs, but also potentially hundreds of computing cores in commodity off-the-shelf Graphics Processing Units (GPU). With GPU computing gaining momentum as a result of increased hardware capabilities and improved programmability, our work shows how the GPU, with a design involving thousands of lightweight threads, can boost network coding performance significantly. Many-core GPUs can be deployed as an attractive alternative and complementary solution to multi-core servers, by offering a better price/performance advantage. In fact, multicore CPUs and many-core GPUs can be deployed and used to perform network coding simultaneously, potentially useful in media streaming servers where hundreds of peers are served concurrently by these dedicated servers. In this paper, we present Nuclei, the design and implementation of GPU-based network coding. With Nuclei, only one mainstream NVidia 8800 GT GPU outperforms an 8-core Intel Xeon server in most test cases. A combined CPU-GPU encoding scenario achieves coding rates of up to 116 MB/second for a variety of coding settings, which is sufficient to saturate a Gigabit Ethernet interface. Index Terms—Network coding, Many-core GPU computing. I

CiteSeerX

Crossref

A Constant Bound on Throughput Improvement of Multicast Network Coding in Undirected Networks

Author: Baochun Li
Lap Chi Lau
Zongpeng Li
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref